Copy number alterations (CNAs), including chromosomal gains and losses, shape the genetics and clinical management of acute myeloid leukemia (AML), particularly in complex karyotype AML (CK-AML). Traditionally, AML-associated CNAs have been studied using cytogenetic approaches, i.e., karyotyping, that are limited in resolution and sensitivity. Here, low-coverage, sparse whole genome sequencing followed by machine learning, applied to over 600 AML patient samples (ECOG-ACRIN trials E1900 and E3999), reveals novel genetic, biologic, and clinical correlates of AML. Multivariate Cox regression analysis identifies a select group of patients with focal, cryptic hemizygous deletions at 17q, 21q, and 3p in otherwise diploid genomes who have inferior survival compared with other patients (median overall survival [OS] = 65.5 vs. 421 days; chi-square test, p = 6 × 10⁻¹¹), independent of factors such as age, treatment, and karyotype complexity status. In addition, we confirm that amplifications at 11q23, which harbors the KMT2A gene, are associated with inferior outcome, and we further identify unique breakpoint features as well as transcriptional signatures suggestive of distinctive biology that is independent of KMT2A translocation. Applying machine learning to the data, specifically non-negative matrix factorization (NMF), we develop a prognostic model that captures clinically adverse, cryptic variation, such as focal 5q deletions, that is missed by conventional chromosome analysis and thereby adds novel, complementary information to traditional cytogenetics-based risk stratification (e.g., ELN).
The model also associates gain of chromosome 8 in intermediate-risk patients with worse outcome compared with other intermediate-risk patients (median OS = 186 vs. 446 days; chi-square test, p = 0.005). It further identifies a subgroup of patients within the adverse-risk category who lack deletions on chromosomes 5, 7, and 17 and who fare better than the remaining adverse-risk patients (median OS = 188 vs. 97.5 days; chi-square test, p = 0.02), likely reflecting previously identified “atypical” CK-AML patients. Last, we develop a prognostic classifier that combines neural network and random forest learning, predicts disease outcome with high accuracy, and relies on only three genomic features, at chromosome bands 5q31.2, 7q31.33, and 8q24, suggesting potential for clinical implementation. Altogether, our results show that low-coverage, sparse whole genome sequencing combined with advanced learning methods can be applied clinically to support the diagnosis and clinical management of AML, and can serve as a powerful tool to further a basic understanding of the genetics and biology of CNAs in myeloid neoplasms and other cancers.
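
The median OS comparisons quoted above can be illustrated with a minimal sketch. It assumes right-censored follow-up data, a Kaplan-Meier estimate of median OS, and a log-rank comparison as the source of the chi-square statistic; the abstract specifies none of these. The function names and the toy follow-up times are hypothetical and are not study data.

```python
from fractions import Fraction
from typing import List, Optional


def km_median(times: List[float], events: List[int]) -> Optional[float]:
    """Kaplan-Meier median: first event time where S(t) <= 0.5, else None.

    events[i] is 1 for an observed death and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)  # exact arithmetic avoids float issues at S(t) == 0.5
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            if surv <= Fraction(1, 2):
                return t
        at_risk -= removed
    return None  # survival never dropped to 50% within follow-up


def logrank_chi2(times_a: List[float], events_a: List[int],
                 times_b: List[float], events_b: List[int]) -> float:
    """Two-group log-rank statistic (chi-square with 1 degree of freedom)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


# Hypothetical toy cohorts (days of follow-up; 1 = death, 0 = censored).
deleted_t, deleted_e = [20, 45, 66, 80, 120], [1, 1, 1, 1, 0]
other_t, other_e = [150, 300, 421, 500, 600], [1, 0, 1, 1, 0]

print("median OS, deletion group:", km_median(deleted_t, deleted_e))
print("median OS, other patients:", km_median(other_t, other_e))
print("log-rank chi-square:", logrank_chi2(deleted_t, deleted_e, other_t, other_e))
```

A p-value would follow from comparing the statistic with a chi-square distribution with one degree of freedom. The covariate-adjusted analysis described in the abstract would require Cox regression rather than this unadjusted comparison.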
